Clickhouse mode of study view #11224

Open
wants to merge 132 commits into
base: master

alisman
Contributor

@alisman alisman commented Nov 22, 2024

Fix # (see https://help.github.com/en/articles/closing-issues-using-keywords)

Describe changes proposed in this pull request:

  • a
  • b

Checks

Any screenshots or GIFs?

If this is a new visual feature please add a before/after screenshot or gif here with e.g. Giphy CAPTURE or Peek

Notify reviewers

Read our Pull request merging policy. It can help to figure out who worked on the file before you. Please use git blame <filename> to determine that and notify them either through slack or by assigning them as a reviewer on the PR

@@ -181,7 +181,7 @@
</if>
</if>
</where>
-Group by clinical_event.EVENT_TYPE, patient.STABLE_ID
+Group by clinical_event.EVENT_TYPE, patient.INTERNAL_ID
Contributor Author

@haynescd this is one fix i did. i think we can keep it. the stable id is not unique across studies

Collaborator

agreed
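
The effect of that change can be illustrated with a minimal sketch. It assumes, as the comment above states, that a patient stable ID can repeat across studies while the internal ID is unique per patient row. The class name, the row type, and the sample values are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; names and data are not from the PR.
class PatientGroupingSketch {

    // Minimal stand-in for a joined clinical_event / patient row.
    static final class EventRow {
        final String eventType;
        final String patientStableId; // assumed to repeat across studies
        final int patientInternalId;  // assumed unique per patient row

        EventRow(String eventType, String patientStableId, int patientInternalId) {
            this.eventType = eventType;
            this.patientStableId = patientStableId;
            this.patientInternalId = patientInternalId;
        }
    }

    // Number of groups produced by GROUP BY EVENT_TYPE, patient.STABLE_ID.
    static long groupsByStableId(List<EventRow> rows) {
        return rows.stream()
                .map(r -> r.eventType + "|" + r.patientStableId)
                .distinct()
                .count();
    }

    // Number of groups produced by GROUP BY EVENT_TYPE, patient.INTERNAL_ID.
    static long groupsByInternalId(List<EventRow> rows) {
        return rows.stream()
                .map(r -> r.eventType + "|" + r.patientInternalId)
                .distinct()
                .count();
    }

    // Two patients from different studies that share one stable ID.
    static List<EventRow> sampleRows() {
        return Arrays.asList(
                new EventRow("TREATMENT", "P-0001", 1),  // study A
                new EventRow("TREATMENT", "P-0001", 2),  // study B, same stable ID
                new EventRow("SURGERY", "P-0001", 1));   // study A
    }

    public static void main(String[] args) {
        List<EventRow> rows = sampleRows();
        System.out.println("groups by stable id:   " + groupsByStableId(rows));
        System.out.println("groups by internal id: " + groupsByInternalId(rows));
    }
}
```

With these sample rows, grouping by stable ID yields 2 groups because the two TREATMENT patients collapse into one, while grouping by internal ID yields 3.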

@@ -223,6 +263,197 @@ public Pair<List<CopyNumberCountByGene>, Long> getPatientCnaGeneCounts(List<Mole
);
}

+@Override
Contributor Author

@haynescd what's all this stuff?

Collaborator

This is the stuff for the new clickhouse implementation

@@ -0,0 +1,32 @@
+DROP TABLE IF EXISTS sample_list_columnstore;
Contributor Author

@haynescd we can kill this file, right?

Collaborator

Agreed

@@ -121,6 +121,7 @@
<include refid="selectGenePanelData"/>
<include refid="fromGenePanelData"/>
WHERE
+SAMPLE_ID IS NOT NULL AND
Contributor Author

@haynescd this is one we might want to undo because it changes profile counts. this is what allows the system to recover from the incomplete sample_profile table issue.

@@ -32,7 +32,7 @@
window.netlify = localStorage.netlify;

if (window.localdev || window.localdist) {
-window.frontendConfig.frontendUrl = "//localhost:3000/"
+window.frontendConfig.frontendUrl = "https://localhost:3000/"
Contributor Author

might possibly break localdb tests

Collaborator

Yea, probably
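
One reason the change could matter: a protocol-relative URL such as "//localhost:3000/" inherits the scheme of the page that loads it, whereas "https://localhost:3000/" always uses HTTPS. The sketch below shows this with standard URI resolution. The page URLs are hypothetical, and how the localdb tests actually serve the page is an assumption, not something stated in the PR.

```java
import java.net.URI;

// Hypothetical illustration of scheme inheritance; page URLs are assumptions.
class FrontendUrlSketch {

    // Resolves a configured frontendUrl against the URL of the loading page,
    // following the generic URI reference-resolution rules.
    static String resolve(String pageUrl, String frontendUrl) {
        return URI.create(pageUrl).resolve(frontendUrl).toString();
    }

    public static void main(String[] args) {
        String httpPage = "http://localhost:8080/index.html";   // hypothetical
        String httpsPage = "https://localhost:8080/index.html"; // hypothetical

        // Old value: the scheme follows the page.
        System.out.println(resolve(httpPage, "//localhost:3000/"));
        System.out.println(resolve(httpsPage, "//localhost:3000/"));

        // New value: the scheme is fixed to https regardless of the page.
        System.out.println(resolve(httpPage, "https://localhost:3000/"));
    }
}
```

If a test environment loads the page over plain HTTP and serves the frontend over HTTP as well, the old value would resolve to an http URL and the new one would not, which is one way the localdb tests could be affected.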

haynescd and others added 25 commits November 23, 2024 23:28
* Add Columnar SQL file to init Clickhouse DB

* Refactored Mapper xml to extract StudyViewFilterMapper
* ✅ Add Unit test for StudyViewMapper Clickhouse

* ✅ Update db props to include mysql and clickhouse datasources to fix tests

* Address comments

* Rename package to clickhouse

* Update to static final

* Use bean name instead of qualifier
* Create new wide table sql file and rename package

* Remove genomic_event view

* Add AlterationFilter to mutated_genes endpoint

* Add AlterationFilter to mutated-genes endpoint

* Fix unit test

* Fix sonar issues

* Add test for mutation types and status

* remove unused imports
* add missing poc clinical data binning function
* Add sample_mv materialized view and use it in mappers
* Add Support for TotalProfiledCase Counts for Mutated-genes endpoint.

* Create sql files to create new tables

* Add unit test for totalProfiledCount

* Add matching gene panel ids

* Add TotalProfiledCountsWithoutPanelData

* Add profileCount for genes without gene panel data

* Add Comments for SQL

* Update matching Gene Panel Ids

* Clean up code

* Fix test

* Add query to get correct Gene Panels

* Fix unit test

* Add comments
* working poc

* refactor logic into service, so clean

* refactor for parameters builder, simplify min max logic, streamline service call

* remove unused services and imports

* remove more unused imports
* Implement molecular profile count endpoint using Clickhouse

* Cleanup
* ✨ Add CNA Gene Endpoint

* 🐛 Fix StudyViewFilterMapper.xml to allow ability to filter on gene and alteration

* Fix merge conflict

* Address comments

* Fix unit tests

* Fix sonar issues
* ✨ Add StructuralVariant-genes endpoint

* Fix sonar issues

* Update MatchingGenePanel request to return list

* Create and use sample_derive

* Update where sample_derived is stored to fix unit test
* use clinical_data_derived instead of sample_clinical_attribute_numeric_mv and patient_clinical_attribute_numeric_mv

* use clinical_attribute_meta instead of sample_clinical_attribute_numeric_mv and patient_clinical_attribute_numeric_mv

* remove unused clinical data count methods and SQL

* fix numericalClinicalDataCountFilter

* Move CategoricalClinicalAttributeFilter to repository

* remove unused columns

* Add override to methods

---------

Co-authored-by: haynescd <[email protected]>
…0857)

* Add patient_id column to genomic_event_derived

* Update sql to convert list of patients to list of samples
* refactor to use clickhouse

* filter out empty attr values

* edit comment

* fix sonarcloud issues

* use parallel stream, shaves off 5s

* use newer mapping annotation
@@ -14,7 +14,9 @@
import java.util.stream.Collectors;

@Component
-@ConditionalOnProperty(name = "persistence.cache_type", havingValue = "redis")
+@ConditionalOnExpression(
+"#{environment['persistence.cache_type'] == 'redis' or environment['persistence.cache_type_clickhouse'] == 'redis'}"
Collaborator

I think we should remove this and either enable or disable caching for the whole system going forward

Contributor Author

agreed. this is so that we can assess performance without cheating with cache. we need caching on for legacy because otherwise, the product is totally unusable!

Contributor Author

and at the same time, when we deploy for demonstration purposes, we want initial load of studyview to compete with legacy, i.e. cache ON.
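
The difference between the old and new conditions can be modelled in plain Java. This is only a sketch of the boolean logic in the two annotations, not Spring code: it compares exact string values and ignores how Spring itself matches property values. The value "none" is a hypothetical placeholder for any setting other than "redis".

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the two conditions; it does not use Spring.
class CacheConditionSketch {

    // Models the old @ConditionalOnProperty on persistence.cache_type.
    static boolean oldCondition(Map<String, String> env) {
        return "redis".equals(env.get("persistence.cache_type"));
    }

    // Models the new expression: either property set to redis enables the bean.
    static boolean newCondition(Map<String, String> env) {
        return "redis".equals(env.get("persistence.cache_type"))
                || "redis".equals(env.get("persistence.cache_type_clickhouse"));
    }

    // Builds a property map for the legacy and ClickHouse cache settings.
    static Map<String, String> env(String legacy, String clickhouse) {
        Map<String, String> env = new HashMap<>();
        env.put("persistence.cache_type", legacy);
        env.put("persistence.cache_type_clickhouse", clickhouse);
        return env;
    }

    public static void main(String[] args) {
        // "none" is a hypothetical placeholder for a non-redis value.
        Map<String, String> clickhouseOnly = env("none", "redis");
        System.out.println("old condition: " + oldCondition(clickhouseOnly));
        System.out.println("new condition: " + newCondition(clickhouseOnly));
    }
}
```

Under this model, setting only persistence.cache_type_clickhouse to redis leaves the old condition false and makes the new one true, which matches the stated goal of switching the cache for ClickHouse mode independently of legacy.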

@@ -19,7 +20,9 @@
import java.util.stream.Collectors;

@Service
-@ConditionalOnProperty(name = "persistence.cache_type", havingValue = {"redis"})
+@ConditionalOnExpression(
+"#{environment['persistence.cache_type'] == 'redis' or environment['persistence.cache_type_clickhouse'] == 'redis'}"
Collaborator

Same as above... Should we still separate these?

zainasir and others added 2 commits November 25, 2024 10:34
* Hide select properties of ClinicalDataFilter from frontend

* Update swagger decorators on clickhouse controller
@alisman alisman changed the title Clickhouse api Clickhouse mode of study view Nov 25, 2024
onursumer and others added 18 commits November 25, 2024 18:03
…#11155)

* Add patient level filtering for aggregation

* Patient level filtering works for non-NA

* Categorical patient level filtering & clean up

* Use new generic assay table schema
…y clickhouse_enabled is set (#11256)

* Update cBioPortal to dynamically load ch Components only when property clickhouse_enabled is set
* Update env var for circleCi
* Merge genomic data bins working
* Workaround for clickhouse bug in numerical data parsing

---------

Co-authored-by: alisman <[email protected]>
* fix CNA query for genomic data filter

* rename one of the cna_query statements to cna_count_query to avoid table name clash

sonarcloud bot commented Dec 11, 2024

Quality Gate failed

Failed conditions
4 New Bugs (required ≤ 0)

9 participants